Metadata harvesting for content-based distributed information retrieval

Authors

  • Fabio Simeoni
  • Murat Yakici
  • Steve Neely
  • Fabio Crestani
Abstract

We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative’s (OAI) Protocol for Metadata Harvesting, the approach occupies middle ground between content crawling and distributed retrieval. As in crawling, some data move toward the retrieval process, but it is statistics about the content rather than the content itself; this allows more efficient use of network resources and a wider scope of application. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval; this reduces the costs of content provision while promoting the simplicity, effectiveness, and responsiveness of retrieval. Overall, we argue that the approach retains the good properties of centralized retrieval without giving up cost-effective, large-scale resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure. In particular, we define a minimal extension of the OAI protocol which supports the coordinated harvesting of full-content indices and descriptive metadata for content resources. Finally, we report on the implementation of a proof-of-concept prototype service for multimodel content-based retrieval of distributed file collections.
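As a rough illustration of the "periodic and incremental" harvesting the abstract refers to, the sketch below builds a standard OAI-PMH `ListRecords` request URL that asks a source only for records changed since a given date. It uses only the base protocol's arguments (`verb`, `metadataPrefix`, `from`, `resumptionToken`); it does not show the paper's extension for full-content indices, and the function name and the `http://example.org/oai` endpoint are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: when it is
        # present, no other argument besides the verb may be sent.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # 'from' restricts the harvest to records created or changed
            # on or after this datestamp, which makes the harvest incremental.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint: request everything changed since the last harvest.
url = list_records_url("http://example.org/oai", from_date="2008-01-01")
```

A harvester would typically store the date of its last successful run and pass it as `from_date` on the next one, following `resumptionToken` values until the source reports that the list is complete.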

Similar resources

Harvesting for Full-Text Retrieval

We propose an approach to Distributed Information Retrieval based on the periodic and incremental centralisation of full-text indices of widely dispersed and autonomously managed content sources. Inspired by the success of the Open Archives Initiative’s protocol for metadata harvesting, the approach occupies middle ground between: (i) the crawling of content, and (ii) the distribution of retriev...

Metadata Tools for Digital Motion Picture Archives

Most of the video information retrieval systems today rely on some set of computationally extracted video and/or audio features, which may be complemented with manually created annotation that is usually either arduous to create or insufficient for capturing the content. This thesis looks at the specific domain of motion pictures to identify the computational features relevant to films and, mor...

Media Browser: An Example of Metadata-Based Browsing

Current methods for finding relevant content, especially in media-rich web environments, suggest that metadata is critical for accurate and efficient information retrieval. We describe a Media Browser tool, which enables users to access content by visually browsing and searching metadata that is stored in a distributed fashion over the web. The basic constraint imposed by the Media Browser tool...

Digital Rights Management for Distributed Multimedia E-Learning Content

Nowadays a steadily increasing amount of multimedia content is generated and requires storage in digital libraries. Current research focuses on identifying user needs to make relevant information available through semantically enhanced retrieval techniques. In this context the already complex task of retrieving multimedia content on a semantic level is further complicated by right management an...

Practical Framework for Harvesting Standard Metadata in Digital Repository

Metadata research drastically improved the resource discovery mechanism in accessing information from a large distributed environment. Even technological capabilities permit multiple metadata schemas for standardizing the structure and content of indexing information towards an efficient resource discovery. This paper presents the issues on standard metadata in order to pursue digital repositor...

Journal:
  • JASIST

Volume 59

Publication date: 2008